Providing content responsive to multimedia signals

ABSTRACT

A method of providing information, including providing a communication session of at least one of audio and video media and applying automatic recognition to media transferred on the communication session. An advertisement is selected by a processor based on the automatic recognition, and non-advertisement information is selected by the processor responsive to the automatic recognition. The selected advertisement and the selected non-advertisement information are presented during the communication session.

RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 13/652,588 filed on Oct. 16, 2012, which is a continuation of U.S. patent application Ser. No. 12/403,539 filed on Mar. 13, 2009, now U.S. Pat. No. 8,341,665, which is a continuation of PCT Patent Application No. PCT/IL2007/01138 filed on Sep. 16, 2007. The contents of the above applications are all incorporated by reference as if fully set forth herein in their entirety.

FIELD OF THE INVENTION

The present invention relates to pattern recognition systems.

BACKGROUND OF THE INVENTION

Speech recognition systems are used in various applications to allow people to provide speech commands to electronic devices, such as telephones and computers.

Speech recognition is also used for translation and transcription.

EP patent publication 1696338, published Aug. 30, 2006, titled “Individual Simultaneous Translator”, the disclosure of which is incorporated herein by reference, describes a telephone that serves additionally as an automatic translator for speech in the surroundings of the telephone, as well as speech received over the telephone line. The translation is sounded to the user instead of the original.

Similar translators are described in GB patent publication 2421815, to O'Donnel, and US patent publication 2002/0181669 to Takatori et al., published Dec. 5, 2002, the disclosures of which are incorporated herein by reference.

US patent publication 2006/0074623 to Tankhiwale, published Apr. 6, 2006, the disclosure of which is incorporated herein by reference, describes using speech recognition for automatic real time transcription of voice conversations over a communication network.

Image recognition is used in various applications.

US patent publication 2007/0081744 to Gokturk et al., published Apr. 12, 2007, the disclosure of which is incorporated herein by reference, describes a method of cataloging images based on image recognition.

Advertisement is a powerful method of enhancing business. In many cases it is desired to target advertisements to the users most likely to be interested in the advertised product.

U.S. patent publication 2005/0234779 to Chiu et al., published Oct. 20, 2005, the disclosure of which is incorporated herein by reference, describes a method of targeting advertisements.

PCT patent publication WO 2007/026320 to Hemar Elad et al., published Mar. 8, 2007, the disclosure of which is incorporated herein by reference, describes a computer client which identifies words in a call using speech recognition or identifies the gender of a speaker and accordingly targets advertisements. This publication also suggests identifying a tone, voice level or other emotion and controlling an avatar accordingly. The publication recognizes the problem of convincing people to allow targeted advertisement, and suggests that the call service be supplied on condition that the advertisements are viewed. This, however, limits the provision of advertisements to services in which the advertisements can substantially subsidize the communication costs.

U.S. provisional patent application 60/765,743, filed Feb. 7, 2006, the disclosure of which is incorporated herein by reference, describes using a voice recognition engine to convert the content of a conversation between two parties into text, which is analyzed in order to select advertisements to be presented to users.

These systems, however, do not always provide sufficiently targeted advertisements.

Another, more general, problem with advertisements is that people tend to ignore them. Even in cases in which they receive cheap or free services which include advertisements, people tend to ignore the advertisements. In addition, in some cases the services provided in exchange for receiving advertisements are too expensive, making the provision of the service for free non-economical.

SUMMARY OF THE INVENTION

An aspect of some embodiments of the present invention relates to a media (i.e., sound and/or video) communication device (e.g., mobile station, computer with instant messaging software) which provides one or more information services together with targeted advertisements, to users conducting a communication session, based on speech recognition of the contents of the session. In some embodiments of the invention, the information services are provided without charge. Providing the user with information services is an incentive for the user to allow the speech recognition, and thus any antagonism to speech recognition for advertisements may be alleviated.

In some embodiments of the invention, the one or more information services comprise text transcription and/or text translation of a conversation or dictation or portions thereof. Alternatively or additionally, the one or more information services comprise dictionary, encyclopedia or other reference source information on words, phrases or groups of words included in the session. In some embodiments of the invention, a web search of a word or set of representative words from the communication session is provided. Further alternatively or additionally, the information services include table information such as time table information of transportation services, opening hours of shops or other services and personal contact information, such as telephone numbers, messaging nick names and/or email addresses. In an exemplary embodiment of the invention, the information services include information on businesses in a specific area, such as supplied by the skype-find service (described at http://www.skype.com/intl/en-gb/help/guides/skypefind.html). In some embodiments of the invention, the one or more information services comprise map displays of areas discussed in the communication session. In still other embodiments of the invention, the one or more information services include weather information, news, sports and/or other updates related to keywords used in the communication session.

Optionally, the information is provided from a remote location, for example from the Internet. Alternatively or additionally, the information is provided from a memory of the voice communication device. For example, in response to a name mentioned on the communication session, the communication device may present information it stores on the name, for example a telephone number. In some embodiments of the invention, the information is provided from a database, website or other source which is not affiliated with the communication device or with software controlling the signal recognition.

An aspect of some embodiments of the invention relates to displaying to one or more participants of a multi-participant conversation, additional information accessed over the Internet or from a designated database on topics discussed in the conversation, identified by applying speech recognition to the conversation. Optionally, the additional information includes dictionary, encyclopedia or map information on terms used in the conversation.

In some embodiments of the invention, an additional information device automatically provides a list of additional information pages suggested responsive to the conversation. The human user selects from the list the data that is to be downloaded or otherwise accessed. In some embodiments of the invention, the selections of the human users are used to further adjust the data provided to users based on speech recognition. For example, the positions of items in the list may be adjusted according to the popularity of the item or of the type of item (e.g., map, dictionary entry, price comparison). Alternatively or additionally, highly popular items or types of items may be displayed automatically with the list or instead of the list.
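The popularity-based adjustment of the suggestion list may be illustrated by the following minimal sketch. It is not taken from the specification: the class name SuggestionRanker, the auto_display_share threshold and the item type labels are hypothetical, and popularity is assumed here to be a simple count of past user selections per item type.

```python
from collections import Counter


class SuggestionRanker:
    """Hypothetical helper: orders suggested items by how often users
    previously selected items of the same type."""

    def __init__(self, auto_display_share=0.5):
        # Count of past selections per item type (assumed popularity measure).
        self.clicks = Counter()
        # Assumed threshold: a type holding at least this share of all
        # selections is displayed automatically.
        self.auto_display_share = auto_display_share

    def record_selection(self, item_type):
        self.clicks[item_type] += 1

    def rank(self, suggestions):
        # suggestions: list of (item_type, title) pairs.
        # sorted() is stable, so equally popular types keep their order.
        return sorted(suggestions, key=lambda s: -self.clicks[s[0]])

    def auto_display_types(self):
        total = sum(self.clicks.values())
        if total == 0:
            return set()
        return {t for t, n in self.clicks.items()
                if n / total >= self.auto_display_share}
```

In this sketch, a type that has never been selected simply stays at the end of the list; other popularity measures, such as per-item counts or time-decayed counts, could be substituted without changing the structure.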

In some embodiments of the invention, the additional information is downloaded from a plurality of unrelated Internet sites.

An aspect of some embodiments of the invention relates to automatically displaying content selected responsive to results of image recognition of a video stream or of still images. Optionally, the content is displayed in real time, within less than 10 minutes or even less than a minute from the image recognition.

In some embodiments of the invention, the video stream is a real time video stream, such that the content is displayed within less than 15 minutes or even less than two minutes from acquiring the portion of the video stream causing the selection of the displayed content. Optionally, the video stream comprises a video conferencing video stream. Alternatively or additionally, the video stream comprises a stream from a surveillance camera, a traffic camera or any other monitoring video camera. The images acquired by the monitoring camera are optionally provided to a monitoring station in addition to being processed for content selection, such that the content selection does not require additional cameras. In some embodiments of the invention, the content is displayed near or at the location of the captured images or video stream. Alternatively, the content selected responsive to the image recognition is displayed in a different location than the images. In an exemplary embodiment of the invention, the information selected responsive to images acquired by a monitoring camera is displayed on public terminals like electronic billboards designed to present advertisement clips.

While in some embodiments of the invention people in the acquired images are aware that content is being selected responsive to the acquired images, in other embodiments people in the acquired images are not aware of the provision of content in response to the image recognition.

Optionally, the displayed content comprises advertisements selected responsive to the images or video stream. For example, if the image recognition identifies a traffic jam, the display may show advertisements for train services or other traffic avoidance measures, while when the road is clear it displays advertisements for safe driving. Alternatively or additionally, the sizes of the advertisements and/or the amount of details used depends on the amount of traffic on the road and/or the average speed.

In an exemplary embodiment of the invention, advertisements are displayed at a location viewable by people appearing in the acquired video stream. Optionally, the advertisements are selected responsive to attributes of the imaged people. For example, when a fat person is identified passing by the camera, an advertisement for a diet may be displayed.

An aspect of some embodiments of the present invention relates to using a semantic map in matching content to a word or a word sequence. Words from the sequence are searched for in the semantic map and content is matched to the words, by a processor, based on their location in the map. In some embodiments of the invention, the semantic map is partially or entirely organized in a hierarchy.

In some embodiments of the invention, the word sequence is collected using speech recognition. Alternatively, the word sequence is taken from a text document. Further alternatively, the word sequence is taken from sub-vocal signals. Alternatively, the word sequence is taken from a visual reading of lip movement, which may not be accompanied by sound.

In some embodiments of the invention, the hierarchic semantic map is taken from an Internet site or database external to the content provision system, such as Wikipedia. Alternatively, the hierarchy is defined specifically for the content provision.

The provided content may include advertisements and/or suggested keywords for representing the word sequence.

An aspect of some embodiments of the present invention relates to using a hierarchic word map in aiding speech recognition. Optionally, the words preceding or surrounding an unclear word are placed on the map, together with possibilities of interpreting the unclear word. The possibility to represent the unclear word is selected at least partially responsive to the distances between the possibilities and the preceding or surrounding words on the map.
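The distance-based selection among candidate interpretations may be sketched as follows. The sketch rests on several assumptions not stated in the specification: the hierarchic word map is modeled as a tree given by a hypothetical PARENT table, the distance between two words is taken as the number of edges on the path through their closest common ancestor, and the candidate with the smallest summed distance to the context words is chosen. The toy vocabulary is illustrative only.

```python
# Hypothetical toy hierarchy: each word maps to its parent (None = root).
PARENT = {
    "science": None,
    "biology": "science",
    "computing": "science",
    "cell": "biology",
    "tissue": "biology",
    "shell": "computing",
    "kernel": "computing",
}


def path_to_root(word):
    """Return the list of nodes from the word up to the root.

    Raises KeyError for words that are not on the map.
    """
    path = []
    while word is not None:
        path.append(word)
        word = PARENT[word]
    return path


def distance(a, b):
    """Number of edges between a and b via their closest common ancestor."""
    path_b = path_to_root(b)
    depth_in_b = {node: i for i, node in enumerate(path_b)}
    for i, node in enumerate(path_to_root(a)):
        if node in depth_in_b:
            return i + depth_in_b[node]
    raise ValueError("no common ancestor")


def disambiguate(candidates, context):
    """Pick the candidate with the smallest total distance to the context."""
    return min(candidates,
               key=lambda c: sum(distance(c, w) for w in context))
```

For an unclear word that could be "cell" or "shell", surrounding words such as "tissue" favor "cell" in this sketch, while surrounding words such as "kernel" favor "shell". In practice the map distance would presumably be combined with the acoustic score of each candidate rather than used alone.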

An aspect of some embodiments of the present invention relates to a method of providing information to a mobile terminal such as a cellular phone, a wireless application protocol (WAP) unit or a wireless LAN unit, responsive to speech recognition of speech signals on a voice channel of the mobile terminal.

In some embodiments of the invention, the content is provided on a channel separate from the voice channel carrying the speech signals, while the voice channel is carrying the speech signals.

An aspect of some embodiments of the invention relates to displaying advertisements on a screen of a mobile station, for example instead of a screen saver, responsive to detection of whether an earphone is connected to the mobile station. In some embodiments of the invention, advertisements are only displayed when an earphone is connected to the cellular phone, as when there is no earphone used during a telephone conversation, the screen of the mobile station is generally not viewable by a human user. In some embodiments of the invention, advertisements are only displayed when the mobile station or the host device is detected to be in a preferable state for the user's attention, such as when downloading a small file or when waiting for a call to be answered. Alternatively, the mobile station tracks the amount of time in which advertisements were displayed while the earphone is utilized and/or while the mobile station is in a preferable state and the advertiser is billed accordingly. Further alternatively or additionally, the duration of displaying advertisements is adjusted responsive to whether the mobile station is connected to an earphone and/or whether the mobile station is in a preferable state.
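The earphone-dependent display and the tracking of display time may be sketched as below. The names AdDisplayTracker, update and billable_seconds are hypothetical, and the sketch assumes that the mobile station reports its state as two booleans (earphone connected, call in progress) together with a timestamp in seconds; the specification does not prescribe any such interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdDisplayTracker:
    """Hypothetical tracker of the time advertisements were viewable."""

    billable_seconds: float = 0.0
    # Timestamp at which the current display period began, if any.
    shown_since: Optional[float] = None

    def update(self, now, earphone_connected, in_call):
        """Report a state change; returns True if ads should be shown.

        Assumption: ads are shown only while a call is in progress and
        an earphone is connected, since the screen is then viewable.
        """
        should_show = earphone_connected and in_call
        if should_show and self.shown_since is None:
            # A display period starts now.
            self.shown_since = now
        elif not should_show and self.shown_since is not None:
            # A display period ends; accumulate its duration for billing.
            self.billable_seconds += now - self.shown_since
            self.shown_since = None
        return should_show
```

The accumulated billable_seconds value is what an advertiser could be billed on in this sketch. A "preferable state" condition, such as waiting for a call to be answered, could be added as a further boolean input.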

An aspect of some embodiments of the invention relates to apparatus which provides content responsive to sub-vocal speech, electromyography or visual lip reading. The content may include, for example, advertisements, dictionary, encyclopedia, map or other information expansion on a word or term and/or detail information.

An aspect of some embodiments of the invention relates to planting a microphone and/or video camera in a location in which people are expected to view advertisements or a shop setting. Sounds and/or images acquired by the microphone and/or camera are analyzed to collect feedback on the advertisements. Optionally, the sound and/or video stream are automatically analyzed to determine portions which may be relevant to the advertisement, and these portions are provided to a human for further analysis. Alternatively, the sound of the portions of interest is automatically transcribed using speech recognition methods and the transcription and/or an analysis thereof is provided to the human for further analysis. In some embodiments of the invention, the camera captures the impressions of people viewing the advertisements or shop settings. Optionally, the images are automatically analyzed using face recognition.

In some embodiments of the invention, the shop setting comprises a website providing an electronic shop.

In some embodiments of the invention, feedback is collected by the hardware of the user's communication device, such as an integrated camera in a cellular phone or webcam connected to a PC. Optionally, feedback information can be collected while the user is exposed to commercial content while using his communication device. Optionally, the commercial content presented to the users on which feedback is collected can originate from any source known in the art and/or may be in accordance with any of the embodiments described herein.

An aspect of some embodiments of the invention relates to a speech recognition unit which is configured to access a web site or a database for aid in the speech recognition. In some embodiments of the invention, the speech recognition unit is configured to access a web site or external database not associated with a manufacturer of the speech recognition unit. Optionally, the speech recognition unit is configured to access a plurality of different web sites. Optionally, the accessed web site provides a word list or search results which aid in word disambiguation.

There is therefore provided in accordance with an exemplary embodiment of the invention, a method of providing information, comprising providing a communication session of at least one of audio and video media, applying automatic recognition to media transferred on the communication session, selecting an advertisement, by a processor, based on the automatic recognition, selecting non-advertisement information, by the processor, responsive to the automatic recognition and conveying the selected advertisements and the selected non-advertisement information, during the communication session.

Optionally, providing the communication session comprises providing a communication session of a single human, with an automatic opposite end terminal. Alternatively or additionally, providing the communication session comprises providing a communication session between a plurality of human users. Optionally, providing the communication session comprises providing a communication session with at least one instant messaging software end point. Optionally, providing the communication session comprises providing a communication session with at least one mobile station end point.

Optionally, applying automatic recognition to media transferred on the communication session comprises applying the automatic recognition by an end terminal of the communication session. Optionally, applying the automatic recognition by an end terminal of the communication session comprises applying the automatic recognition with aid from a remote server. Optionally, selecting non-advertisement information comprises selecting a dictionary or encyclopedia entry on a term identified on the communication session by the automatic recognition. Optionally, selecting non-advertisement information comprises selecting a map related to a term identified on the communication session by the automatic recognition.

Optionally, selecting non-advertisement information comprises selecting the non-advertisement information also responsive to information on a human participant in the session, which information was collected by the processor before the communication session began.

Optionally, conveying the advertisement and non-advertisement information comprises displaying both the advertisement and non-advertisement information on a single display. Optionally, conveying the advertisement and non-advertisement information comprises displaying on a display of an end terminal of the communication session. Optionally, conveying the advertisement and non-advertisement information comprises conveying at least one of the advertisement and non-advertisement information audibly. Optionally, conveying the advertisement and non-advertisement information comprises conveying at least one of the advertisement and non-advertisement information to a human user at least 30 minutes after it is selected. Optionally, selecting the non-advertisement information comprises selecting information from a web site of user generated content. Optionally, selecting the non-advertisement information comprises selecting a transcription or translation of audio signals passing on the connection. Optionally, selecting the non-advertisement information comprises selecting information in a local memory associated with the processor. Optionally, selecting the non-advertisement information comprises selecting information retrieved from a remote server.

There is further provided in accordance with an exemplary embodiment of the invention, a communication terminal, comprising a communication interface configured to conduct at least one of audio and video media communication sessions with remote terminals, a display and a processor configured to apply automatic recognition to media transferred through the communication interface, and to select both advertisement and non-advertisement information responsive to the recognition and to display the selected information on the display.

Optionally, the communication interface comprises a wireless interface.

There is further provided in accordance with an exemplary embodiment of the invention, a method of providing information, comprising providing a communication session of at least one of audio and video media, applying automatic recognition to media transferred on the communication session, selecting information, by software running on a processor, based on the automatic recognition and downloading the selected information from at least one web site not associated with a manufacturer of the software.

Optionally, selecting the information comprises displaying a list of the possible information selected responsive to the automatic recognition and receiving an indication of an entry in the list. Optionally, selecting the information comprises selecting responsive to selections of information by human users in previous communication sessions. Optionally, downloading the selected information from at least one web site comprises downloading from a plurality of non-associated web sites.

There is further provided in accordance with an exemplary embodiment of the invention, a method of providing information, comprising providing a communication session of at least one of audio and video media, applying automatic recognition to media transferred on the communication session, displaying a list of possible information selected responsive to the automatic recognition, receiving an indication of an entry in the list and displaying the information corresponding to the entry of the list pointed to by the received indication.

There is further provided in accordance with an exemplary embodiment of the invention, a method of providing information, comprising providing a video stream, applying automatic image recognition to the video stream, selecting information responsive to the image recognition and displaying the selected information within ten minutes from performing the image recognition leading to selection of the displayed information.

Optionally, displaying the selected information comprises displaying within ten minutes from acquiring images of the video stream leading to the selection of the information, by a video camera. Optionally, providing the video stream comprises providing a video stream acquired by a portable video camera. Optionally, providing the video stream comprises providing a video stream acquired by a camera included in a mobile communication terminal. Optionally, displaying the selected information comprises displaying advertisements.

There is further provided in accordance with an exemplary embodiment of the invention, a method of matching information to a word sequence, comprising providing a semantic map, receiving a word sequence, by a processor, determining locations of words of the sequence in the semantic map, selecting information for the word sequence, at least partially responsive to the locations of the words of the sequence in the map and displaying the selected information.

Optionally, providing the semantic map comprises providing a map from a web site of user provided content. Optionally, selecting the information comprises providing a list of keywords and corresponding information for display and selecting a keyword from the list based on its distance in the semantic map from words of the sequence. Optionally, receiving a word sequence comprises receiving a word sequence received from a speech recognition unit.
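Selecting a keyword from a list according to its distance in the semantic map from the words of the sequence may be sketched as follows. The sketch makes assumptions beyond the text: the semantic map is modeled as an undirected graph in a hypothetical SEMANTIC_MAP adjacency table, distance is the number of edges on the shortest path (found by breadth-first search), words absent from the map are ignored, and the keyword with the smallest summed distance wins.

```python
from collections import deque

# Hypothetical toy semantic map as an adjacency list (undirected edges).
SEMANTIC_MAP = {
    "travel": ["flight", "hotel", "train"],
    "flight": ["travel", "airport"],
    "airport": ["flight"],
    "hotel": ["travel"],
    "train": ["travel", "timetable"],
    "timetable": ["train"],
    "food": ["pizza", "restaurant"],
    "pizza": ["food"],
    "restaurant": ["food"],
}


def distances_from(start):
    """Breadth-first search: shortest edge count from start to each node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in SEMANTIC_MAP.get(node, []):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def select_content(words, keyword_content):
    """Return the content whose keyword lies closest to the known words.

    keyword_content maps each keyword to the information to display.
    Returns None when no word is on the map or no keyword is reachable.
    """
    known = [w for w in words if w in SEMANTIC_MAP]
    if not known:
        return None
    best, best_score = None, float("inf")
    for keyword, content in keyword_content.items():
        dist = distances_from(keyword)
        score = sum(dist.get(w, float("inf")) for w in known)
        if score < best_score:
            best, best_score = content, score
    return best
```

With this toy map, a sequence mentioning "airport" and "train" would select the content attached to the keyword "hotel" rather than "restaurant", because only "hotel" is reachable from those words. Summing distances is one possible aggregation; taking the minimum or a weighted average would fit the same structure.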

There is further provided in accordance with an exemplary embodiment of the invention, a wireless communication terminal, comprising a communication session interface configured to conduct at least one of audio and video media communication sessions with remote terminals, over a wireless link, a display and a processor configured to perform media recognition on media passing through the communication interface, to select information responsive to the recognition and to display the selected information on the display.

Optionally, the terminal includes a server interface configured for communicating with a media recognition server, the server interface adapted to operate in parallel with the communication session interface, and wherein the processor is configured to perform at least one of media recognition, selection of information and display of the selected information utilizing information received over the server interface. Optionally, the processor is configured to download the selected information through the server interface while the communication session interface is carrying a media session.

There is further provided in accordance with an exemplary embodiment of the invention, a wireless communication terminal, comprising a communication session interface configured to conduct at least one of audio and video media communication sessions with remote terminals, over a wireless link, a display, and a processor configured to display advertisements on the terminal responsive to a state of the wireless terminal. In some embodiments of the invention, the wireless terminal has an earphone interface and the processor is configured to display advertisements responsive to whether an earphone is coupled to the earphone interface. Optionally, the processor is configured to display advertisements on the display only when an earphone is coupled to the earphone interface. Optionally, the processor is configured to track the amount of time that advertisements were displayed while an earphone was coupled to the earphone interface.

There is further provided in accordance with an exemplary embodiment of the invention, a method of providing a person with information, comprising acquiring sub-vocal speech from a person, identifying words in the acquired sub-vocal speech and displaying content to the person responsive to the identified words. Optionally, acquiring the sub-vocal speech comprises acquiring by electrodes near the vocal cords of the person.

There is further provided in accordance with an exemplary embodiment of the invention, a method of acquiring feedback signals on display advertisements or presented information, comprising applying media recognition to the acquired signals generated in response to advertisements and/or presented information. Optionally, advertisements and/or presented information may or may not be provided by an external source. Optionally, acquired feedback is used for commercial and/or statistical objects and/or for improving media recognition and/or matching advertisements and/or additional information.

There is further provided in accordance with an exemplary embodiment of the invention, a method of acquiring feedback on a commercial setting, comprising providing a commercial setting, positioning a camera or microphone directed at a position from which individuals can view the commercial setting, acquiring media signals by the camera or microphone, applying media recognition to the acquired signals and generating feedback on the commercial setting having a storage size of less than 10% of a size of the acquired media signals.

Optionally, generating the feedback comprises providing portions of the media signals including words which may relate to the commercial setting. Optionally, providing the commercial setting comprises providing an e-shop.
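Generating feedback that is limited to less than 10% of the size of the acquired media signals may be sketched as follows. The sketch assumes, beyond what the text states, that the acquired media has already been divided into segments, that each segment carries a transcript produced by media recognition, and that relevance is judged by simple whitespace-delimited keyword matching. The function name generate_feedback and its parameters are hypothetical.

```python
def generate_feedback(segments, keywords, max_ratio=0.10):
    """Keep media segments whose transcript mentions a keyword,
    while the kept portions stay below max_ratio of the total size.

    segments: list of (transcript, media_bytes) pairs.
    Returns (selected_segments, selected_size, total_size).
    """
    total_size = sum(len(media) for _, media in segments)
    wanted = {k.lower() for k in keywords}
    selected = []
    selected_size = 0
    for transcript, media in segments:
        # Naive matching: punctuation attached to a word is not stripped.
        if not wanted & set(transcript.lower().split()):
            continue
        # Skip a segment that would push the feedback to the size limit.
        if selected_size + len(media) >= max_ratio * total_size:
            continue
        selected.append((transcript, media))
        selected_size += len(media)
    return selected, selected_size, total_size
```

Segments are considered in acquisition order in this sketch, so earlier relevant portions take precedence over later ones; a ranking by relevance could be applied first if the most informative portions are to be preferred.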

BRIEF DESCRIPTION OF THE DRAWING

The present invention will now be described in the following detailed description of exemplary embodiments of the invention and with reference to the attached drawing, in which dimensions of components and features shown are chosen for convenience and clarity of presentation and are not necessarily shown to scale. Generally, only structures, elements or parts that are germane to the discussion are shown in the figure.

FIG. 1 is a schematic illustration of an information provision system, in accordance with an exemplary embodiment of the invention;

FIG. 2 is a schematic view of an implementation with a Wi-Fi phone system, in accordance with an exemplary embodiment of the invention;

FIG. 3A is a schematic illustration of a portion of a semantic map 500, used in accordance with an exemplary embodiment of the invention;

FIG. 3B is a flowchart of a computerized method of creating a semantic map 500, in accordance with an exemplary embodiment of the invention;

FIG. 4A is a flowchart of acts performed in disambiguation of speech recognition results, in accordance with an exemplary embodiment of the invention;

FIG. 4B is a flowchart of acts performed in selecting content corresponding to a speech session, in accordance with an exemplary embodiment of the invention;

FIG. 5A is a schematic view of usage of a speech recognition system in an advertisement environment, for example in front of a shop window, in accordance with an exemplary embodiment of the invention;

FIG. 5B is a schematic view of usage of automatic speech recognition (ASR) with a younger population, in accordance with an exemplary embodiment of the invention; and

FIG. 6 is a schematic illustration of an information display system, in accordance with an exemplary embodiment of the invention.

DETAILED DESCRIPTION OF SOME EMBODIMENTS

Overview

FIG. 1 is a schematic illustration of an information provision system 100, in accordance with an exemplary embodiment of the invention. System 100 receives input audio signals through an input interface, such as a microphone 104A, or a communication link interface 109. In some embodiments of the invention, system 100 also includes a camera 104B through which it receives still or video images. A signal recognition unit 108 receives the input sound and/or image signals and converts the speech and/or images into text 102 and/or identifies in the signals words, terms, emotions and/or other attributes 107. A matching unit 110 selects content to be provided to a user in response to text 102 and/or attributes 107. The selected content is presented on a display 112 to a user.
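The flow of data through system 100 may be pictured with the following minimal sketch. It is an illustration under stated assumptions, not an implementation of the units themselves: the recognizer is passed in as an arbitrary callable standing in for recognition unit 108, the matching step is reduced to keyword lookup in a hypothetical catalog, and the display is any callable that accepts a content item. All names (RecognitionResult, match_content, run_pipeline) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RecognitionResult:
    """Assumed output of the recognition step: text 102 and attributes 107."""

    text: str
    attributes: Dict[str, str] = field(default_factory=dict)


def match_content(result: RecognitionResult,
                  catalog: Dict[str, str]) -> List[str]:
    """Stand-in for matching unit 110: keyword lookup in the text."""
    words = set(result.text.lower().split())
    return [content for keyword, content in catalog.items()
            if keyword in words]


def run_pipeline(signal: bytes,
                 recognize: Callable[[bytes], RecognitionResult],
                 catalog: Dict[str, str],
                 display: Callable[[str], None]) -> List[str]:
    """Input signal -> recognition -> matching -> display."""
    result = recognize(signal)
    selected = match_content(result, catalog)
    for item in selected:
        display(item)
    return selected
```

Passing the recognizer and the display as callables keeps the sketch independent of any particular speech or image recognition engine, which matches the description's point that recognition may run locally or with aid from a remote unit.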

Input

In some embodiments of the invention, the input signals are acquired through a microphone 104A of system 100. The audio signals may be acquired by software associated with system 100 or by software of a different task, such as instant messaging software, running on apparatus implementing system 100. Alternatively, recognition unit 108 may receive audio signals acquired by a separate device, optionally at a remote location. For example, recognition unit 108 may be part of a communication device which receives audio signals from a remote location, for example as part of a real time communication session (e.g., a telephone call). The communication device may comprise a line telephone, a cellular terminal, personal digital assistant (PDA) or any other mobile terminal or a computer, such as one running an instant messaging process. In an exemplary embodiment of the invention, the communication device comprises an HP 6515 PDA. In some embodiments of the invention, the communication session passes over a cellular network, a WiFi or WiMax network or any other wireless network. In still other embodiments of the invention, the audio and/or image signals are received through a direct wire, cable or wireless connection from a TV, radio or other receiver.

In some embodiments of the invention, the acquired audio signals are generated by a human user without relation to their provision to recognition unit 108. Possibly, the human user is not aware that the audio signals are provided to recognition unit 108. Optionally, the audio signals provided to recognition unit 108 are provided to an additional unit, such as a speaker device which sounds the audio signals and/or to a telephone connection leading the signals to a remote location.

Alternatively or additionally to acquiring audible sound signals, recognition unit 108 receives sub-vocal signals, such as described in US patent publication 2006/0129394 to Becker et al., in US patent publication 2007/0106501 to Morita et al. and/or in “Subvocal Speech Demo”, downloaded from http://www.nasa.gov/centers/ames/news/releases/2004/subvocal/subvocal.html, the disclosures of all of which are incorporated herein by reference. Optionally, the sub-vocal signals are collected using electrodes on the skin above the vocal cords of a person. Alternatively or additionally, sub-vocal signals are collected using an electrode included in an earphone 129 collecting sub-vocal signals passing through the skin and/or by a camera or other sensor directed at the vocal cords of the individual whose sub-vocal signals are being monitored.

In other embodiments of the invention, camera 104B is used to read the lips of a human speaker. In still other embodiments of the invention, system 100 receives text input, optionally in addition to audio signals. For example, if system 100 is implemented on a cellular telephone, it may receive, in addition to audio signals of telephone conversations, also SMS text. The text messages are optionally provided directly to matching unit 110 for selection of content to be displayed responsive to the text.

In some embodiments of the invention, the input signals are real time signals provided to recognition unit 108 immediately after they are generated. The signals may belong, for example, to a telephone or instant messaging conversation, to a face-to-face conversation in the vicinity of system 100 or elsewhere or to sound signals coming from a radio, television, computer, CD player, MP3 player or any other appliance which generates speech signals (including songs). In an exemplary embodiment of the invention, recognition unit 108 operates on a lecture and system 100 provides background material responsive to keywords recognized in the lecture. Furthermore, recognition unit 108 may identify non-speech sounds, such as noises of home appliances, cars or pet animals.

In other embodiments of the invention, recognition unit 108 receives signals from a prestored file, for example music files or lectures located on the Internet or in a local directory of a computer or mobile station hosting system 100. When operating on non-real-time files, system 100 may operate in a single pass, or may perform two or more passes, identifying the main points of the file and providing content accordingly.

For example, system 100 may operate on user-generated media content displayed in a web site, such as “youtube”. The content provided by system 100 may be displayed by the website alongside the user content or may be accessible by users applying a suitable control. Alternatively or additionally, the content provided by system 100 is used by the management of the website in monitoring the user generated content. Alternatively or additionally, the captured input signals or feedback signals of the user are used by the management of the website, and/or by sponsors of the website.

Recognition

In an exemplary embodiment of the invention, recognition unit 108 comprises a speech recognition engine of any type known in the art, which converts the speech content into text. The speech recognition is optionally performed using any method known in the art, including human speaker dependent methods in which samples of the human speaker's speech are taken in advance, and human speaker independent methods. The speech recognition may be performed using the most sophisticated and processing intensive methods, but this is not necessary in all the embodiments of the invention. In fact, in some embodiments of the invention, a lightweight speech recognition process is used, which can be easily mounted on low processing power mobile stations. While such a process may have an error rate which is too high for transcription of an entire communication session, it is sufficient to provide advertisements and/or additional information related to the session at a relatively high accuracy level.

In some embodiments of the invention, the results of the speech recognition are verified in a spell check dictionary 119 or other word pool (e.g., an encyclopedia), in order to remove mistakes. Optionally, the dictionary or other word pool is a network based pool providing service to a plurality of recognition units 108, such as a web based word pool. In some embodiments of the invention, the word pool is managed by an entity not associated with system 100. Using a web based word pool makes the speech recognition software more compact and allows simple update of the word pool. In some embodiments of the invention, the web based server is distanced by at least a kilometer from recognition unit 108. Optionally, the dictionary or other word pool provides user based content. Alternatively or additionally to using a word pool, terms are searched for on the web and if the term has a low hit rate it is replaced with a similar term having a much higher hit rate.
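
The hit-rate replacement described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the `hit_counts` mapping stands in for web search hit rates, and the function name, the similarity cutoff and the `low_hits` and `ratio` thresholds are hypothetical values not specified in the description.

```python
from difflib import get_close_matches


def correct_term(term, hit_counts, low_hits=10, ratio=100):
    """Replace a rarely seen term with a similar, far more common one.

    hit_counts is a hypothetical mapping of term -> number of search hits.
    """
    hits = hit_counts.get(term, 0)
    if hits >= low_hits:
        return term  # the term is common enough; keep it as recognized
    # Candidate terms that are textually similar to the recognized term.
    candidates = get_close_matches(term, list(hit_counts), n=5, cutoff=0.8)
    best = max(candidates, key=lambda c: hit_counts[c], default=None)
    # Replace only when the candidate has a much higher hit rate.
    if best is not None and hit_counts[best] >= max(hits, 1) * ratio:
        return best
    return term
```

For instance, with hit counts of 3 for "sprngfield" and 500000 for "springfield", the misrecognized term would be replaced by "springfield", while a term with no similar high-hit candidate is left unchanged.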

Recognition unit 108 optionally interprets the entire content of the audio signals it receives. Alternatively, recognition unit 108 only interprets some of the signals received, for example by finding specific words from a dictionary and/or specific keywords associated with available information to be provided. Alternatively or additionally, recognition unit 108 identifies emotions, accent or other beyond-text information in the audio signals, for example any of those described in the above mentioned PCT patent publication WO 2007/026320. Optionally, the identified emotions are provided to matching unit 110, separately from the text. Alternatively or additionally, the identified emotions are implanted in the text in the form of predetermined symbols designating the different emotions. For example, a sequence of exclamation marks can indicate excitement.

In some embodiments of the invention, the text is analyzed to determine emotions of the human speaker. For example, the analysis may identify curse words, repetitions and/or broken sentences as a sign of anger.
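
A minimal sketch of such a text analysis is given below. It is an assumption-laden illustration, not the method of the invention: the `CURSE_WORDS` list, the equal weighting of the three cues, the use of "..." or "--" as a marker of a broken sentence, and the threshold are all hypothetical.

```python
import re

# Hypothetical word list; a real system would use a fuller lexicon.
CURSE_WORDS = {"damn", "hell"}


def anger_score(text):
    """Count simple textual cues of anger in a transcript fragment."""
    words = re.findall(r"[a-z']+", text.lower())
    curses = sum(w in CURSE_WORDS for w in words)
    # A word immediately repeated ("no no") counts as one repetition.
    repeats = sum(a == b for a, b in zip(words, words[1:]))
    # Assumption: the transcriber marks broken sentences with "..." or "--".
    broken = len(re.findall(r"\.\.\.|--", text))
    return curses + repeats + broken


def seems_angry(text, threshold=2):
    """Return True when enough anger cues are present (threshold assumed)."""
    return anger_score(text) >= threshold
```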

The acquired audio signals are optionally generated by a single source, for example in a lecture, from a conversation between a human and an automatic answering system (e.g., voice mail) or signals collected from a single side of a conversation. Alternatively, acquired audio signals are of a conversation of a plurality of people. In some embodiments of the invention, recognition unit 108 does not differentiate between the people participating in the conversation. Alternatively, recognition unit 108 additionally performs speaker recognition, by determining speech attributes (e.g., frequencies used, pitch) of speech portions and determining whether they match speech attributes of other speech portions.

In some embodiments of the invention, recognition unit 108 stores locally speech attributes of users of system 100 and/or of people whose speech was previously handled by system 100. The stored attributes are optionally used to differentiate between speakers in a conversation and/or to aid in the speech recognition and/or in the matching of content by matching unit 110. Alternatively or additionally, a central user registry 117 stores speech attributes and/or samples for users, such that different recognition units 108 at different locations can use user information accumulated by other systems 100. Optionally, users installing software of recognition unit 108 and/or using system 100 are requested to enroll with registry 117 and provide speech samples to aid in recognizing their speech. Alternatively or additionally, users are encouraged during speech recognition sessions to enter the names of speakers whose speech is recognized. The names are uploaded to registry 117 along with speech samples and/or speech attribute information for further use. In some embodiments of the invention, at the beginning of a conversation, one of the users indicates to system 100 the names of some or all of the participants, to aid recognition unit 108 in identifying the users and their attributes.

Registry 117 may be configured to limit the use of uploaded data only to the system 100 uploading data or only to a group of systems with which it is associated. Alternatively, registry 117 is configured to share uploaded data between many systems 100.

Optionally, during a conversation, recognition unit 108 determines which individual voiced each speech portion. For portions not matching any known users, recognition unit 108 optionally assigns a general tag, such as “speaker 1”, and further portions belonging to the same speaker are associated with this tag. As mentioned above, the speaker and/or anyone knowing the name of the speaker may enter the name into the system.

Optionally, during a conference call, recognition unit 108 identifies the person currently speaking based on voice attributes, for example by comparing to a library of voice attributes of the participants in the conference call and/or using a neural network. In an exemplary embodiment of the invention, the names of the identified persons are displayed on one or more of the terminals participating in the conference call to aid the participants in identifying the current speaker. In some embodiments of the invention, the names of the participants are displayed together with other information associated with the named participant, such as an image, portrait, personal avatar, alias, online profile and/or web persona. The other information may also be displayed instead of the name or with a pseudo name, for example when the name should be kept secret. Alternatively to providing the name of the speaker, recognition unit 108 tags the different speakers with arbitrary tags, such as “speaker 1”, “speaker 2”. Optionally, instead of totally arbitrary tags, the tags used are chosen responsive to the voice or speech attributes of the user. For example, a fast speaker may be tagged as “fast speaker” and a person with a deep voice may be tagged “deep voice”. Other exemplary tags could include, for example, “the fast earlier”, “the lady” and “nervous guy”.
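
As a rough illustration of choosing tags responsive to voice attributes, the sketch below falls back to an arbitrary tag when no attribute stands out. The attribute names and the numeric thresholds (`fast_wpm`, `deep_hz`) are hypothetical assumptions and are not taken from the description.

```python
def descriptive_tag(index, words_per_minute, mean_pitch_hz,
                    fast_wpm=180.0, deep_hz=110.0):
    """Pick a tag for an unnamed speaker from simple voice attributes."""
    if words_per_minute >= fast_wpm:
        return "fast speaker"
    if mean_pitch_hz <= deep_hz:
        return "deep voice"
    # No distinctive attribute: fall back to an arbitrary numbered tag.
    return f"speaker {index}"
```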

In some embodiments of the invention, during the conference call the participants can enter the name of the person identified by any of the tags, so that thereafter the name is used instead of the tag. The name and voice attributes may be saved for further conference calls conducted at later times.

Alternatively or additionally, recognition unit 108 receives indications as to the source of the speech at every time point, for example, based on signals received from the communication terminals participating in the session.

In some embodiments of the invention, system 100 determines from the sound signals the distance from the microphone through which they were acquired to their source, and thus differentiates between signals from different sources. Optionally, an array of microphones is used to determine the locations of different sources.

In some embodiments of the invention, knowledge of the identity of the speaker is used in simplifying the speech recognition. For example, user registry 117 may include for some or all of the users sample words and their interpretation, specific accent information and/or voice attributes which aid in the speech recognition. The information may be collected at an enrollment stage, during previous speech sessions of the speaker and/or continuously during the current speech recognition session.

Matching

In some embodiments of the invention, matching unit 110 provides a transcription of the audio signals. Alternatively or additionally, matching unit 110 translates the speech recognized audio signals into one or more languages other than the original language.

Additional Information

In some embodiments of the invention, matching unit 110 provides additional information, beyond that included in the input audio signals, about words or subjects identified in the audio signals. Matching unit 110 optionally manages a correlation database 127 which correlates between words and the content to be provided if the word is identified in the text. Alternatively or additionally, correlation database 127 is managed on a central server accessible, for example, over the Internet.

The additional information optionally comprises non-commercial information. Optionally, the additional information is not sponsored by an entity interested in distributing the information and there is no entity that paid for distribution of the information.

Optionally, for geographical names, a map is provided showing the mentioned location. Words considered rare optionally initiate display of a dictionary interpretation of the word. Alternatively or additionally, words or phrases having a corresponding encyclopedia entry initiate display of the entry.

Alternatively or additionally, a set of rules governing which display is provided for each word is defined. The rules are optionally customizable by a user. In an exemplary embodiment of the invention, if the word is a place on a map, the map is displayed. Otherwise, if the word appears in an encyclopedia, the encyclopedia entry is displayed. If the word is neither on a map nor in an encyclopedia, matching unit 110 optionally determines whether the word is a rare word, for example by counting its occurrence in a web search. If the word is rare, a dictionary entry for the word is displayed. In some embodiments of the invention, the information is provided from Internet sites, such as online dictionaries, encyclopedias and map sites.
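
The exemplary rule order above (map, then encyclopedia, then dictionary for rare words) can be sketched as a simple cascade. The `gazetteer` and `encyclopedia` collections, the `web_hits` callable and the `rare_below` threshold are hypothetical stand-ins for the map site, encyclopedia and web search count mentioned in the text.

```python
def choose_display(word, gazetteer, encyclopedia, web_hits, rare_below=1000):
    """Apply the exemplary rule order to decide what to show for a word.

    gazetteer and encyclopedia are hypothetical collections of known entries;
    web_hits is a hypothetical callable returning a search hit count.
    """
    if word in gazetteer:
        return "map"
    if word in encyclopedia:
        return "encyclopedia"
    if web_hits(word) < rare_below:
        return "dictionary"  # rare word: show its dictionary entry
    return None  # common word with no entry: nothing is displayed
```

Because the rules are described as customizable by a user, an implementation might keep the order of these checks in a configurable list rather than hard-coding it as done here.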

In some embodiments of the invention, the user is asked to choose between different suggestions, meanings or corrections. For example, if the city Springfield is mentioned, the user may be requested to choose among the multiple cities having that name, and/or the option “show city closest to you” accounting for personal registry information.

Instead of each word involving display of a single type of information, some words may involve displaying a plurality of pieces of information, for example both a map and a weather forecast for a city. Alternatively or additionally, the information to be displayed is selected responsive to a context of the discussion. For example, if any of a list of words relating to the weather (e.g., rain, hot, humidity) were mentioned in the audio signals within a predetermined time, a weather forecast is displayed for locations and otherwise a map is displayed. As another example, if the word train or bus was mentioned, a time-table is optionally displayed for transportation to the location. In some embodiments of the invention, the decision also or alternatively depends on the extent of correlation with the user's personal information (e.g., registry information, information stored locally on the user's device, such as a file on his computer, or information gathered from previous or current sessions). For example, the information to be displayed is chosen according to the distance between the named location and the location of the user.
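
The context-dependent choice in the weather and transportation examples can be sketched as below. The word lists follow the examples in the text; the 60-second window, the restriction to words spoken before the location mention, and the precedence of weather over transportation are assumptions made only for illustration.

```python
WEATHER_WORDS = {"rain", "hot", "humidity"}
TRANSPORT_WORDS = {"train", "bus"}


def display_for_location(mention_time, recent_words, window_s=60.0):
    """Choose what to show for a mentioned location from recent context.

    recent_words is an iterable of (timestamp_seconds, word) pairs.
    """
    # Keep only words spoken within the window before the location mention.
    nearby = {w.lower() for t, w in recent_words
              if 0 <= mention_time - t <= window_s}
    if nearby & WEATHER_WORDS:
        return "weather forecast"
    if nearby & TRANSPORT_WORDS:
        return "timetable"
    return "map"
```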

In some embodiments of the invention, the information to be displayed is selected responsive to non-speech attributes and/or content of the input signals, such as coughing or background traffic noise. Alternatively or additionally, the information to be displayed is selected in response to an analysis of the text used by the speaker, such as the language level used by the speaker. In some embodiments of the invention, the information to be displayed is selected at least partially based on personal information the speaker says about himself. This information may be retrieved using grammar inflection, such as by identifying sentences beginning with “I am”, and using their content. Alternatively, the information to be displayed is selected based on personal information provided by the speaker at enrollment.

Many other information displays may be provided, such as price comparisons for products mentioned in the audio signals and/or a list of patents or other documents related to the word. In some embodiments of the invention, a web search for a word or term is provided, for example using any of the methods described in U.S. Pat. No. 7,027,987 to Franz et al., issued Apr. 11, 2006, the disclosure of which is incorporated herein by reference.

In some embodiments of the invention, some types of information are provided only for words repeated in the input audio signals at least a predetermined number of times and/or otherwise considered a basic component of the received audio signals, for example because other words of a related subject are included in the received audio signals.
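
A minimal sketch of the repetition criterion follows. The `min_count` value and the optional `stop_words` filter are hypothetical; the description only requires "at least a predetermined number of times".

```python
from collections import Counter


def repeated_words(words, min_count=3, stop_words=frozenset()):
    """Return the words that occur at least min_count times in the input."""
    counts = Counter(w.lower() for w in words if w.lower() not in stop_words)
    return {w for w, n in counts.items() if n >= min_count}
```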

Optionally, the user can configure system 100 with rules governing which information is to be displayed.

In some embodiments of the invention, correlation database 127 is dynamically adjustable by feedback received from users. The feedback optionally includes indication of a level of usefulness of displayed information, an indication of whether more or less detail is desired and/or indication of desired information not provided. Optionally, in subsequent sessions, the information provided is selected based on the feedback from the specific user. Alternatively, the information provided in subsequent sessions is adjusted according to the accumulated feedback from all users or based on feedback from a sub-group of users to which the user belongs. In some embodiments of the invention, the effect of the feedback on information provided to a user is adjusted according to the feedback in a manner that the feedback of users more similar to the user has more weight. Alternatively or additionally, feedback provided recently, for example during the current communication connection, is given more weight than older feedback.
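
One way such weighting might be realized is a weighted average in which each feedback item is scaled by the similarity of its author to the current user and by an exponential recency decay. This is a sketch under assumptions: the similarity score in [0, 1], the exponential decay and the one-hour half-life are illustrative choices not given in the description.

```python
def weighted_usefulness(feedback, now, half_life_s=3600.0):
    """Combine usefulness ratings, favoring similar users and recent feedback.

    feedback is an iterable of (rating, similarity, timestamp_seconds), where
    similarity in [0, 1] is a hypothetical measure of how similar the user who
    gave the feedback is to the current user.
    """
    num = den = 0.0
    for rating, similarity, ts in feedback:
        recency = 0.5 ** ((now - ts) / half_life_s)  # weight halves per half-life
        weight = similarity * recency
        num += weight * rating
        den += weight
    return num / den if den else None
```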

In an exemplary implementation of this embodiment, a logical model for matching relevant content is generated for a group of users having common attributes and/or belonging to a mutual club. In another example, feedback is collected or specifically categorized if received from frequent online shoppers of books. Focusing on a specific group of people with shared attributes may facilitate recognition. Also, the similarity of the members of such groups may aid in refining the selection of relevant content to display. Furthermore, the communal or social activities of the group might facilitate commercially targeting people with the same tendencies, or facilitate collecting statistical feedback of only individuals that are commercially relevant to the advertiser.

In an exemplary implementation of this embodiment, information, whether raw (i.e., the original transcript) or processed, can be used to augment or expand any database, which can be part of system 100, external to the system, and/or belong to an outside source or third party. This information can be used for surveys or any statistical study or research. By collecting the input information and/or recording it and/or documenting it, the system can channel not only relevant output to the user but also relevant data to possible beneficiaries, or for any purpose that might benefit from a large statistical sample and that can also utilize the abilities of system 100 in order to further analyze and/or categorize bodies of information. Alternatively or additionally, the statistical information can be factored into a semantic map or logical model or algorithm that the system is using or is based upon, in order to improve on it or keep it constantly updated. Alternatively or additionally, the collected statistical information can be used for refining or improving the methods used by the system, for any further or future or consecutive sessions, or can be used also for the current session, recalculating the semantic map or logic model or algorithm in real-time. Optionally, generated statistical information, or information processed and/or sorted by the system, can be used in the manner described for purposes of the system itself and/or for purposes of other systems or processes.

Advertisements

The content may include, in some embodiments of the invention, advertisements selected responsive to words used in the discussion. The advertisements are optionally directed at inducing, promoting and/or encouraging purchase of products and/or services and/or acceptance of ideas. Optionally, the user to whom the advertisements are displayed is not charged for display of the advertisements, but rather the provider of the advertisements is charged for their display.

The selection of the advertisements may be performed using any method known in the art, such as used by the Adsense software and/or that described in U.S. patent publication 2007/0186165 to Maislos et al., published Aug. 9, 2007, the disclosure of which is incorporated herein by reference. In some embodiments of the invention, system 100 displays both advertisements and other information. Optionally, the other information is displayed only if the user agrees to displaying the advertisements. In addition to aiding in receiving agreement to the display of advertisements, the combined display of information with the advertisements increases the chances that the user will view and/or pay attention to the advertisements which are displayed near or are surrounded by targeted information.

Alternatively or additionally, other methods are used to increase the chances that advertisements are viewed by the user. Optionally, when the advertisements are displayed on a screen of a mobile station during a conversation, the advertisements are displayed only when the mobile station is connected to an earphone, such that it is expected that the user is not holding the screen against his ear in a manner which prevents viewing the advertisements. Alternatively or additionally, system 100 keeps track of the time advertisements were displayed while the mobile station is connected to an earphone. In some embodiments of the invention, advertisements are displayed on a terminal of a user only when the user is silent, or more weight is given in billing the advertiser for advertisement time when the user is silent.

In some embodiments of the invention, advertisements, or other information, are displayed only when determined that a user is in the vicinity of the screen on which the data is displayed. For example, display 112 may be associated with a presence sensor, such as a camera, which indicates when a user is near the display and/or looking at the display. In an exemplary embodiment of the invention, display 112 is associated with a proximity sensor which shuts down the display when too close to a user.

Optionally, advertisements are displayed especially when the communication device is in a preferable state in which it is loading software or downloading a file from the Internet and/or is limited in use for other tasks. In some embodiments of the invention, system 100 keeps track of the time advertisements are displayed in a preferable state.

In some embodiments of the invention, advertisements, or other information, are displayed responsive to detected specific conditions. For example, if a monitoring system is collecting images that attest to a traffic jam or a red light in the next intersection, advertisements are displayed on electronic billboards. Optionally, billing is determined according to the specific conditions in which users or the targeted audience are more susceptible to advertisements.

Optionally, microphone 104A collects audio signals generated for a predetermined time during and/or after displaying the advertisement, as feedback on the advertisement. In some embodiments of the invention, the collected audio signals are passed to the advertiser or other entity for manual analysis. Alternatively, the collected audio signals from the time of the advertisement display are analyzed to find references to the advertisement, based on keywords in the advertisement and/or a search for the word “advertisement” or related words. In some embodiments of the invention, the advertiser is billed according to the number of times words which may be related to the advertisement are collected by microphone 104A, during a predetermined time after display of the advertisement. In some embodiments of the invention, the billing is also adjusted according to the average number of times these words appear in conversations in which the advertisement is not displayed. Optionally, in these embodiments, a list of words which may be related to the advertisement is defined and system 100 counts the number of times one or more of these words is collected. Other methods of assessing the exposure of users to advertisements may be used, such as counting the users that follow a link provided by the advertisement.
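
The mention-based billing can be illustrated with the sketch below. How the baseline adjusts the bill is not specified in the description; subtracting the expected baseline count and clamping at zero is an assumption, as are the function names and the `price_per_mention` value.

```python
def count_related(words, related):
    """Count occurrences of advertisement-related words in a word sequence."""
    related = {w.lower() for w in related}
    return sum(w.lower() in related for w in words)


def ad_charge(words_after_ad, related, baseline_count, price_per_mention=0.05):
    """Bill only for mentions in excess of the no-advertisement baseline.

    baseline_count is the average number of related-word mentions in a
    comparable time window of conversations without the advertisement.
    """
    excess = max(0.0, count_related(words_after_ad, related) - baseline_count)
    return excess * price_per_mention
```

For example, three related mentions collected after the advertisement against a baseline of one mention would be billed as two mentions.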

Further Input to Aid Selection of Information

In some embodiments of the invention, the information to be displayed is selected based on further information beyond the contents of the input audio signals. Such information may include, for example, time or date, user location (e.g., based on GPS readings), user socio-demographic information (e.g., age, gender), user interests and/or information from previous sessions of the user. The information may be collected from the current terminal selecting the display and/or based on information from other terminals not involved in the current information display.

Other information can be derived not just from words or phrases but also from indications or cues that can present themselves indirectly, or from a more general analysis, possibly of a large portion of the conversation. Such information may include, for example, repetition of a certain term, or the consistency of a certain subject. Another example can be the type and/or quality of the relationship of the conversing users, such as a professional relationship or one of a romantic inclination. This example can provide helpful information on the nature of the current session and also on each individual participating. By speaking in a certain terminology or by revealing affection, there is much that can be known about the user, and accordingly the information displayed to the user may be selected by matching unit 110. Additionally, a user may expose information about himself, particularly his personality or state of mind or convictions or beliefs or any indications regarding what content is most relevant for him, or most appropriate at this specific time. A user, for example, can speak assertively, or be shy about his choice of words. A user, as a further example, can use harsh phrases frequently to express anger. Users can also talk excessively in modern slang, which can imply they are young. By being also attentive to such information, the system improves on the profile of the user, producing more exact or desirable results, whether temporarily (only relevant currently) or in further or later use of the system.

In some embodiments of the invention, the process of matching content or determining the most appropriate result to output is based on, or assisted or supported by, additional characteristics of the users or cues in the conversation that are not necessarily literal or expressed in the textual transcription or in the lingual sequences. Such information may include, for example, indications of the qualities or characteristics of the sound of the voice of the user, such as pitch, tension and/or tremor, whispers and/or shouts, loudness, stutters or laughter and/or any recognition of audible expressions that can attest to the user's state or contribute to assessing the best match of content or most appropriate provided information for this user.

Content Display

The information selected by matching unit 110 is displayed or otherwise presented to the user. Alternatively, the selected information is added to a list from which the user can select desired information to be displayed. In some embodiments of the invention, when the user terminal has a large cache, some or all of the listed information is cached on the terminal, to allow the user immediate view when selecting an item from the list. When not cached, bandwidth is not wasted on downloading information that the user is not interested in. This option is particularly useful for wireless devices or other devices having limited bandwidth. Providing the list, however, allows the user to easily select the desired information without wasting time on searching and/or defining the desired information.

In an exemplary embodiment of the invention, the information is displayed on a computer, for example in parallel to conducting an instant messaging voice conversation. In some embodiments of the invention, the display is carried out by screen saver software. For example, the screen saver may display information and/or advertisements in response to words in conversations in the vicinity of the computer. In other embodiments of the invention, the information is displayed on the screen of a mobile device, such as a cellular telephone.

In other embodiments of the invention, information is displayed publicly in response to signals collected from a specific area. For example, information selected responsive to conversations taking place in a shopping mall may be displayed in a display window of a shop of the mall, possibly in a display window facing the specific area from which the conversations are collected.

It is noted that in addition to displaying the information on the user terminal or alternatively to displaying on the user terminal, the information may be displayed on a different device, possibly in a different location. This alternative may be used, for example, to allow peers in a social network to keep updated about each other.

In some embodiments of the invention, sound and/or video content is displayed to the user responsive to the recognized audio signals. For example, background music related to words in the conversation may be sounded to the user. In an exemplary embodiment of the invention, songs associated with recurring words appearing in the recognized audio signals are sounded. In another example, background music matching the mood of a user participating in an input conversation is sounded.

Charges

In some embodiments of the invention, users are charged for the information service. Optionally, the user is charged a one-time sum for installation of software which governs the display of the information. Alternatively or additionally, the user is charged a fixed monthly rate. Further alternatively or additionally, the user is charged according to the amount of information for which he requests further information.

In other embodiments of the invention, the users are not charged for the information provided, but rather the cost of providing the information is covered by displaying advertisements. Alternatively or additionally, the information is provided by a service provider which benefits from the increase in traffic due to the provision of the information to the client. In some embodiments of the invention, the communication services of the user are subsidized, for example for large volume use of the information service and/or for display of advertisements.

Architecture

In some embodiments of the invention, the tasks of recognition unit 108, matching unit 110 and display 112 are performed in a single location, possibly by a single processor or a single terminal. Alternatively, one or more of the tasks is outsourced to a separate, possibly remote, apparatus or system. For example, some or all of the signals may be transmitted to a remote recognition unit for speech recognition. Alternatively or additionally, matching unit 110 is located on a remote central server. The recognized, transcribed and/or translated text is transmitted to the remote server and information to be displayed, or addresses from which to retrieve the information, are returned. Alternatively, as described above, the selection of content is performed locally, but the content itself is retrieved from a remote server, for example over the Internet.

The display may also be in a different location from the input location and/or recognition unit 108. For example, the input signals may be acquired and converted into text on a cellular phone, and the text is transmitted to a user computer where it is matched to content which is displayed on the screen of the computer.

The selected information may be provided to the user immediately during the conversation, or may be supplied to the user at a later time, for example when the information requires a substantial amount of reading time and may distract the user from the conversation. Optionally, the user chooses when and/or how the selected information will become available to him or be presented to him.

In an exemplary embodiment of the invention, system 100 is implemented on a home entertainment center which also provides television shows. By collecting signals associated with the use of the home entertainment center, system 100 records TV shows that were matched to the user's interests, even if he/she is not presently using the system. Optionally, advertisements added by system 100 are presented when the user watches the recorded content. In an exemplary embodiment of the invention, system 100 identifies keywords and/or dominant pictures appearing in programs watched by the human user. Alternatively or additionally, system 100 identifies the interests of the user from conversations of the human user. The programs that are recorded may be identified from a TV guide or any other index, or may be stored online and then analyzed to determine whether their content matches the profile of the user's interests.

In an exemplary embodiment of the invention, system 100 is implemented on a cellular phone and provides content responsive to conversations of the cellular phone with other telephones. While a telephone connection is in operation, the cellular phone manages an additional data connection with a central server of system 100 which aids in performing one or more of the tasks of system 100.

FIG. 2 is a schematic view of an implementation with a Wi-Fi phone system, in accordance with an exemplary embodiment of the invention. A base system 300 is connected to the Internet. In addition, it is connected, for example through a cable 302, to a Wi-Fi router 304 that communicates over a wireless link 306 using the 802.11x protocol with nearby devices such as a Wi-Fi phone 310. Alternatively, Wi-Fi phone 310 is connected to base system 300 through a USB cable 308. Wi-Fi phone 310 can use either integrated microphone 320 or a cord-free earphone such as a Bluetooth earphone 314 having microphone 316 to capture sound waves. In addition or alternatively, an integrated camera 322 of phone 310 or an integrated camera of earphone 318 can be used to capture images. Captured audio signals and/or images are then processed as described above regarding FIG. 1 to create content presented on the display screen of phone 310. The user of phone 310 may have an incentive to use system 100 by having free calls in exchange for being presented 112 with commercial content. In addition, the phone user might be paid for each item of commercial content presented on the phone. Alternatively or additionally, advertisements are provided together with useful content which the user considers of value.

Semantic Map

In some embodiments of the invention, a semantic map is used to aid in speech recognition and/or in content selection, as is now described.

FIG. 3A is a schematic illustration of a portion of a semantic map 500, used in accordance with an exemplary embodiment of the invention. FIG. 3A shows a main topic 502, having three sub-topics 512, 514 and 516, which in turn have sub-items 522, 524 and 526. Each node in semantic map 500 represents a word or term and its sub-nodes represent words which are sub-units of the parent node. For example, the term “human voice” may have the sub-topics: “talk”, “song” and “laughter”. The nodes are connected to related sub-nodes by branches, marked 531-537.

FIG. 3B is a flowchart of a computerized method of creating a semantic map 500, in accordance with an exemplary embodiment of the invention. Beginning from a starting point word (583), a categorized word collection, such as Wikipedia, is accessed to find (550) the word therein. Related words having a narrower meaning are retrieved (552) from the word collection and added (554) to map 500 as sub-topics of the word. This process is repeated for the sub-topics, until the map is completed. In some embodiments of the invention, the map comprises a tree in which each node has only a single parent. Alternatively, the map is not a tree and nodes may relate to a plurality of different parent terms.
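
The map-building procedure of FIG. 3B can be sketched as a breadth-first expansion. The sketch below is illustrative only: the dictionary `WORD_COLLECTION` is a hypothetical stand-in for a categorized word collection, and the function name `build_semantic_map` and its `max_depth` stopping rule are assumptions that do not appear in the description. It builds the tree variant, in which each node has a single parent.

```python
from collections import deque

# Hypothetical stand-in for a categorized word collection: maps a word to
# related words having a narrower meaning.
WORD_COLLECTION = {
    "human voice": ["talk", "song", "laughter"],
    "talk": ["whisper", "shout"],
    "song": ["lullaby"],
}


def build_semantic_map(start_word, collection, max_depth=3):
    """Return {word: [sub-topics]} built breadth-first from start_word.

    Each word is added at most once, so every node has a single parent.
    max_depth is an assumed stopping rule for "until the map is completed".
    """
    semantic_map = {start_word: []}
    queue = deque([(start_word, 0)])
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue
        # Find the word in the collection and retrieve its narrower terms.
        for narrower in collection.get(word, []):
            if narrower in semantic_map:
                continue  # already placed under another parent
            semantic_map[word].append(narrower)  # add as a sub-topic
            semantic_map[narrower] = []
            queue.append((narrower, depth + 1))  # repeat for the sub-topic
    return semantic_map


semantic_map = build_semantic_map("human voice", WORD_COLLECTION)
```

Dropping the `if narrower in semantic_map` check on the append (while still avoiding re-queuing) would give the non-tree variant, in which a node may relate to several parent terms.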

In some embodiments of the invention, the method of FIG. 3B is performed using a single categorized word collection. Alternatively, the method is performed using a plurality of word collections.

System 100 may generate a single semantic map 500 for repeated use with speech recognized words. Alternatively, a separate semantic map is generated for each speech session or file.

Alternatively to computerized generation of the semantic map, the semantic map may be created partially or entirely manually.

FIG. 4A is a flowchart of acts performed in disambiguation of speech recognition results, in accordance with an exemplary embodiment of the invention. During a speech recognition session, for each recognized word, a counter associated with the entry of the word in the semantic map is incremented. Optionally, when a word does not clearly match a single word, the speech recognition provides (602) a list of words which may have been intended by the speaker. For each of the words in the list, a score is calculated (606) based on the distance between the corresponding entry of the word and adjacent entries whose counter was incremented and/or the number of occurrences of adjacent entries in the speech session. In an exemplary embodiment of the invention, the score is equal to minus the shortest distance in branches from the two neighboring words from the left and right side of the suggested word. The word in the list having the highest score is optionally used (608).
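
The scoring of FIG. 4A can be illustrated with a short sketch. It rests on one reading of the exemplary score: the negative of the summed shortest branch distances from a candidate to the words recognized immediately to its left and right. That reading, the toy map `BRANCHES`, and the function names are assumptions made for illustration only.

```python
from collections import deque

# Hypothetical semantic map as an undirected adjacency list of branches.
BRANCHES = {
    "animal": ["dog", "cat"],
    "dog": ["animal", "bark"],
    "cat": ["animal"],
    "bark": ["dog", "tree"],
    "tree": ["bark", "log"],
    "log": ["tree"],
}


def branch_distance(graph, source, target):
    """Shortest number of branches between two nodes, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


def pick_word(graph, candidates, left_word, right_word):
    """Return the candidate with the highest score (assumed scoring, see above)."""
    def score(candidate):
        total = 0
        for context_word in (left_word, right_word):
            dist = branch_distance(graph, candidate, context_word)
            if dist is None:
                return float("-inf")  # unrelated candidates rank last
            total += dist
        return -total

    return max(candidates, key=score)


# The recognizer could not decide between "dog" and "log".
chosen = pick_word(BRANCHES, ["dog", "log"], left_word="cat", right_word="bark")
```

Here "dog" lies two branches from "cat" and one from "bark", while "log" lies five and two branches away, so "dog" receives the higher score.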

FIG. 4B is a flowchart of acts performed in selecting content corresponding to a speech session, in accordance with an exemplary embodiment of the invention. A predetermined database correlates between identified words and content to be displayed responsive thereto. Given a recognized word, matching unit 110 determines (620) whether the word appears in the database. If the word does not appear in the database, semantic map 500 is searched (621) for neighboring words in the map, which do appear in the database. The words found in the database are given (624) a ranking score and accordingly content for display is selected (626).
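
The flow of FIG. 4B might be sketched as follows. The toy map `BRANCHES`, the database `CONTENT_DB`, the `max_branches` search limit and the choice of ranking score (minus the branch distance) are all hypothetical and serve only to show the order of the acts.

```python
from collections import deque

# Hypothetical semantic map (undirected branches) and content database.
BRANCHES = {
    "pet": ["dog", "cat"],
    "dog": ["pet", "poodle"],
    "cat": ["pet"],
    "poodle": ["dog"],
}
CONTENT_DB = {
    "dog": "Article: caring for dogs",
    "cat": "Advertisement: cat food",
}


def select_content(word, graph, database, max_branches=2):
    """Return (matched_word, content) for a recognized word, or None."""
    if word in database:  # (620) the word itself appears in the database
        return word, database[word]
    # (621) search the map for neighboring words that appear in the database
    ranked = []
    seen = {word}
    queue = deque([(word, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_branches:
            continue
        for neighbor in graph.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor in database:
                # (624) assumed ranking score: closer words rank higher
                ranked.append((-(dist + 1), neighbor))
            queue.append((neighbor, dist + 1))
    if not ranked:
        return None
    _, best = max(ranked)  # (626) select content by ranking score
    return best, database[best]


result = select_content("poodle", BRANCHES, CONTENT_DB)
```

In this example "poodle" is absent from the database, so the content associated with its nearest neighbor "dog" is selected.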

In some embodiments of the invention, the ranking score is a function of the distance, measured for example as the number of branches separating, in the semantic map, the nodes representing the original word and the word found in the database. Alternatively or additionally, the ranking score depends on the number of words in the speech session in the vicinity of the ranked word in the semantic map. Further alternatively or additionally, each of the branches (531-537) is assigned a weight Wi, which indicates a measure of the relation between the nodes connected by the branch. The distance between two nodes is optionally a function of the weights of the branches connecting the nodes, for example the product of the weights of the branches connecting the nodes. The distance between two nodes connected in a plurality of different paths is optionally a function, such as average, maximum or minimum, of the weights of the different paths.
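
The weighted variant can be illustrated as follows. The sketch assumes weights in the range (0, 1] where a higher weight means a stronger relation; the description does not fix a scale, and the data in `WEIGHTS` and all function names are hypothetical. Each path is scored by the product of its branch weights, and multiple paths are combined with a caller-supplied function such as maximum, minimum or average.

```python
# Hypothetical weighted branches; each key is an unordered pair of nodes.
WEIGHTS = {
    ("music", "song"): 0.9,
    ("song", "lullaby"): 0.8,
    ("music", "sleep"): 0.2,
    ("sleep", "lullaby"): 0.5,
}


def neighbors(weights, node):
    """Yield (neighbor, weight) for every branch touching node."""
    for (a, b), w in weights.items():
        if a == node:
            yield b, w
        elif b == node:
            yield a, w


def path_products(weights, source, target, visited=None):
    """Return the product of branch weights for every simple path."""
    visited = (visited or frozenset()) | {source}
    if source == target:
        return [1.0]
    products = []
    for nxt, w in neighbors(weights, source):
        if nxt not in visited:
            tails = path_products(weights, nxt, target, visited)
            products += [w * p for p in tails]
    return products


def combined_weight(weights, source, target, combine=max):
    """Combine the per-path products, for example with max, min or a mean."""
    products = path_products(weights, source, target)
    return combine(products) if products else 0.0


products = sorted(path_products(WEIGHTS, "music", "lullaby"))
strongest = combined_weight(WEIGHTS, "music", "lullaby", combine=max)
```

With these sample weights the path through "song" yields 0.9 × 0.8 and the path through "sleep" yields 0.2 × 0.5, so taking the maximum favors the path through "song". Enumerating every simple path is practical only for small maps.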

In some embodiments of the invention, the weights of the branches are determined automatically from an external database, such as Wikipedia, by determining the strength of the relation between the two terms. For example, the strength of the relation may be determined based on the number of times the terms of the nodes are mentioned together and/or based on the number of users moving from one to the other in Wikipedia.

In some embodiments of the invention, the semantic map is hierarchical; in other embodiments, the semantic map is non-hierarchical.

In some embodiments of the invention, the semantic map is used for items taken from a speech session; in other embodiments it is used with visual objects, and in other embodiments it is used for text items.

In some embodiments of the invention, the semantic map comprises text objects and/or sound objects and/or visual objects.

In some embodiments of the invention, the semantic map is used for web-based advertising, such as Google (www.google.com) advertisements. Currently, an advertiser purchases advertisements for specific words. Using the semantic map, the advertiser can buy the word and all the words close to it in the semantic map (for example, within a certain number of branches of separation from that word), or a word and all the words that are below it in a hierarchical semantic map.
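
The two purchase options described above can be sketched as two small queries over a hierarchical map. The map contents and the function names `words_below` and `words_within` are hypothetical; the sketch merely shows how a purchased word could be expanded into a set of related words.

```python
from collections import deque

# Hypothetical hierarchical semantic map: parent -> sub-topics.
SEMANTIC_MAP = {
    "vehicle": ["car", "bicycle"],
    "car": ["sedan", "convertible"],
    "bicycle": ["mountain bike"],
}


def words_below(semantic_map, word):
    """All words below the purchased word in the hierarchy."""
    found = []
    stack = list(semantic_map.get(word, []))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(semantic_map.get(current, []))
    return sorted(found)


def words_within(semantic_map, word, max_branches):
    """All words within max_branches branches of the purchased word."""
    # Branches are traversed in both directions (parent and child).
    adjacency = {}
    for parent, children in semantic_map.items():
        for child in children:
            adjacency.setdefault(parent, set()).add(child)
            adjacency.setdefault(child, set()).add(parent)
    seen = {word}
    queue = deque([(word, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == max_branches:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    seen.discard(word)
    return sorted(seen)


below_car = words_below(SEMANTIC_MAP, "car")
near_car = words_within(SEMANTIC_MAP, "car", max_branches=1)
```

An advertiser buying "car" with its sub-tree would cover "convertible" and "sedan"; buying "car" with one branch of separation would additionally cover the parent term "vehicle".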

Other Uses

FIG. 5A is a schematic view of usage of a speech recognition system in an advertisement environment, for example in front of a shop window, in accordance with an exemplary embodiment of the invention. A microphone 806 is installed in proximity to a point where people 802 are expected to view advertisements or other commercial setups (e.g., a shop window setting). Microphone 806 captures comments 804 of viewers of the commercial setup and passes the captured audio signals to a computer 808, which optionally converts the acquired speech into text and/or otherwise processes the acquired signals for review by the shop owner or advertiser. Responses of viewers may be collected by sensors other than microphone 806, such as a camera or a touch sensor. The microphone may be placed in a hidden location or may be easily viewed in order to encourage responses.

In some embodiments of the invention, audio or video responses to the content of an electronic shop (e-shop) displayed over the Internet are collected. Optionally, the responses are collected through microphones and/or cameras of computers of home users viewing the e-shop. Responses are optionally collected from many users and analyzed to find entries of interest, for example by finding statements that include words relating to the content of the e-shop. In some embodiments of the invention, the contents of the audio responses are automatically converted into text and/or automatically aggregated to provide a short summary of the responses to an owner of the e-shop.

FIG. 5B is a schematic view of usage of automatic speech recognition (ASR) with a younger population, in accordance with an exemplary embodiment of the invention. A child 820 says a word 822 describing the object he wants to see. The word 822 is captured by a microphone 824 and the sound is processed by a computer 830, for example a laptop computer, which performs the ASR and presents the relevant picture 828 on screen 826. The system could provide an educational game in which toddlers learn how to pronounce nouns. For example, a toddler would learn how to pronounce the word “dog”, since once he pronounces it correctly a picture of a dog is presented.

The speech recognition abilities may be used for transcription and/or translation of speech signals in voice and/or video messages.

Another possible example is in a slightly different configuration in which the text is sent to another handset. For example, one user could leave a voice mail for another user, and the other user would receive it as a text message, for example an SMS.

In an exemplary embodiment of the invention, any of the methods described above may be used in determining elements in a computer game or an interactive environment. Optionally, a game involving player speech may analyze the player speech and accordingly select a continuation of an adventure game. For example, a player who appears nervous according to his speech may be given an easier task than more confident players.

Another possible example is in collecting input signals from web aliases, profiles, accounts or personas that are exclusively virtual but belong to a user. For example, if a user registers a character in SecondLife, he might conduct a voice conversation with other characters in SecondLife. Such conversation may be speech recognized for the benefit of advertisers, SecondLife administrators, a community of SecondLife characters and/or of the specific user.

In an exemplary embodiment of the invention, system 100 is used to provide information to customers calling a service center for technical support or ordering products. According to an analysis of the conversation, an automatic response may be sent to the customer apologizing for a long waiting period or providing additional information. The automatic response may be provided by email or may be printed and sent along with an ordered product. For example, along with an ordered pizza, system 100 may automatically generate a customized advertisement printed on the pizza (or other product) casing, based on the conversation with the service or ordering center.

Optionally, the analysis of conversations is used also to collect statistical information from a plurality of conversations. For example, the statistics may include the frequencies of use of various words and the distributions of talking attributes in different age groups.

In an exemplary embodiment of the invention, the non-advertisement information is provided in order to allow collection of the statistical information.

In an exemplary embodiment of the invention, system 100 is part of a security network or other closed networks such as a business network. A security network might be sponsored by advertisement and provide captured signals of the crowd for both the advertisers and for security objects. A professional network might facilitate the communication between colleagues, for example by utilizing collected signals to better supply each worker with the relevant information from inside the company database, like finding similar projects in other departments. The professional network might be connected to a specific database that is relevant to the workers, or is manageable by administration, like displaying reminders on bathroom screens for workers to wash their hands, or displaying useful information about changes in the corporation. Optionally, workers will benefit from the service of system 100 of matching relevant information that will be presented in addition to management information and/or information designated by administration.

Images

While the above description emphasized providing content in response to audio signals, any of the above embodiments may be implemented in a similar manner based on matching information to images. Optionally, images captured by camera 104B or received from any other source are searched to identify known images, and information and/or advertisements are matched to the identified known images. For example, if an image of an iPod is identified, an encyclopedia entry on the iPod may be provided and/or an advertisement for add-ons to the iPod is presented. In another example, the appearance of a person might assist in matching relevant content, such as the clothes the person is wearing, how much makeup the person is using, an estimation of an age group of the person and/or a mood of the person.

The image recognition may be performed using any method known in the art, such as those described in US patent publication 2006/0012677 to Neven et al., published Jan. 19, 2006, and US patent publication 2007/0175998 to Lev, the disclosures of which are incorporated herein by reference.

FIG. 6 is a schematic illustration of an information display system 940, in accordance with an exemplary embodiment of the invention. System 940 comprises a camera 952 which acquires images, optionally a video stream of images. The images are provided to an image recognition unit 954 which identifies objects of interest in the images. Information on the identified objects and optionally their attributes is forwarded to an information selection unit 958 which determines accordingly what information is to be displayed on a display 956, using any of the above described methods.

In an exemplary embodiment of the invention, camera 952 may monitor people, such as person 942, standing near display 956 and select advertisements for display 956 according to attributes of the people. For example, advertisements directed to a child audience may be displayed when image recognition unit 954 identifies a large percentage of children in the images acquired by camera 952. Alternatively to being directed at a location from which display 956 is viewed, camera 952 may view an entrance to a shop or other closed area in which display 956 displays advertisements or other information. The advertisements displayed are optionally selected according to the average profile of people entering the shop.
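
Selecting an advertisement from audience attributes can be sketched as a simple majority rule. The group labels, the inventory `ADS`, the 50% threshold and the fallback advertisement are assumptions made for illustration; the description only requires that a large percentage of a group, such as children, triggers advertisements directed to that group.

```python
from collections import Counter

# Hypothetical advertisement inventory keyed by audience group.
ADS = {
    "child": "Toy store advertisement",
    "adult": "Car advertisement",
}


def select_advertisement(detected_groups, ads, threshold=0.5,
                         default="General advertisement"):
    """Pick an advertisement for the dominant audience group.

    detected_groups holds one label per person identified in the images,
    as might be output by an image recognition unit.
    """
    if not detected_groups:
        return default
    group, count = Counter(detected_groups).most_common(1)[0]
    share = count / len(detected_groups)
    if share >= threshold and group in ads:
        return ads[group]
    return default


chosen_ad = select_advertisement(["child", "child", "adult"], ADS)
```

The same rule could be applied to labels accumulated over time at a shop entrance to approximate an average profile of people entering the shop.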

In some embodiments of the invention, the advertisements are selected responsive to behavior against rules identified in the images of camera 952. For example, when a camera monitoring a printer at a work place identifies misuse of the printer it may show on display 956 a warning and/or use instructions.

Camera 952 is stationary, in some embodiments. In other embodiments of the invention, camera 952 is a portable camera, possibly mounted on a mobile communication terminal. In these embodiments, display 956 is optionally the display of the mobile terminal. Alternatively, display 956 is separate from the mobile terminal, which periodically transmits information selection instructions to the display. In some embodiments of the invention, camera 952 stores the selected information until the mobile terminal is connected to a base computer. Camera 952 may also be mounted on home and/or office appliances, such as refrigerators.

In some embodiments of the invention, the images from camera 952 are additionally provided to a monitoring station 950. Thus, camera 952 is used for two different tasks and the cost of camera hardware is reduced. In some embodiments of the invention, installation of system 940 is financed by the advertisements.

CONCLUSION

It will be appreciated that the above described methods may be varied in many ways, including changing the order of steps and/or performing a plurality of steps concurrently. It will also be appreciated that the above description of methods and apparatus is to be interpreted as including apparatus for carrying out the methods and methods of using the apparatus. The present invention has been described using non-limiting detailed descriptions of embodiments thereof that are provided by way of example and are not intended to limit the scope of the invention. Many specific implementation details may be used.

It should be understood that features and/or steps described with respect to one embodiment may sometimes be used with other embodiments and that not all embodiments of the invention have all of the features and/or steps shown in a particular figure or described with respect to one of the specific embodiments.

It is noted that some of the above described embodiments may describe the best mode contemplated by the inventors and therefore may include structure, acts or details of structures and acts that may not be essential to the invention and which are described as examples. Structure and acts described herein are replaceable by equivalents which perform the same function, even if the structure or acts are different, as known in the art. Variations of embodiments described will occur to persons of the art. Therefore, the scope of the invention is limited only by the elements and limitations as used in the claims, wherein the terms “comprise,” “include,” “have” and their conjugates, shall mean, when used in the claims, “including but not necessarily limited to.”

What is claimed is:
 1. A wireless communication terminal, comprising: an earphone housing; a mount mechanically connected to said housing for holding said earphone housing in place close to a wearer's ear; a microphone adapted to capture a plurality of sound waves; an integrated camera adapted to capture a plurality of images; a local processor which processes said plurality of sound waves and said plurality of images; a communication link interface for transmitting an outcome of processing said plurality of images and said plurality of sound waves by said local processor to a mobile device via a wireless communication network so as to allow an application installed on said mobile device to process said outcome; wherein said earphone housing contains said microphone, said integrated camera, and said local processor.
 2. The wireless communication terminal of claim 1, wherein said application communicates with a remote system in response to processing said outcome.
 3. The wireless communication terminal of claim 2, wherein said application communicates with said remote system via a public cellular network.
 4. The wireless communication terminal of claim 2, wherein said application analyzes at least one of said plurality of images and said plurality of sound waves to identify content and communicates with said remote system for acquiring data related to said content.
 5. The wireless communication terminal of claim 4, wherein said application comprises a speech recognition module adapted to convert said plurality of sound waves to text and to identify said content accordingly.
 6. The wireless communication terminal of claim 1, wherein said outcome is transmitted to said mobile device via a Bluetooth communication.
 7. The wireless communication terminal of claim 1, further comprising a module for receiving GPS readings.
 8. The wireless communication terminal of claim 1, wherein said mobile device is a cellular phone.
 9. The wireless communication terminal of claim 1, wherein said application detects an event captured in at least one of said plurality of images by an analysis and instructs the displaying of said content in a display of said mobile device.
 10. The wireless communication terminal of claim 9, wherein said event is a traffic situation selected from a group consisting of a traffic jam, amount of traffic on a road, no traffic on a road, and average speed.
 11. The wireless communication terminal of claim 9, wherein said content comprises an advertisement.
 12. The wireless communication terminal of claim 9, wherein said content is acquired in response to said detection.
 13. The wireless communication terminal of claim 9, wherein said displaying comprises adjusting a size in which said content is displayed in said display according to said event.
 14. The wireless communication terminal of claim 9, wherein said displaying comprises selecting an amount of details from said content to present in said display according to said event, said displaying comprises displaying said selected amount of details.
 15. The wireless communication terminal of claim 9, further comprising identifying a location of the wireless communication terminal; wherein said displaying comprises selecting at least some of said content according to said location.
 16. The wireless communication terminal of claim 1, wherein said application analyzes said plurality of images to read lips of a speaker imaged in said plurality of images.
 17. The wireless communication terminal of claim 1, wherein said integrated camera is located to image an area in front of a wearer.
 18. A method of wireless communication, comprising: capturing a plurality of sound waves and a plurality of images using a microphone and an integrated camera of an earphone device having a housing mechanically connected to said earphone device for holding said earphone device in place close to a wearer's ear; processing said plurality of sound waves and said plurality of images using a processor of said earphone device; transmitting an outcome of processing said plurality of images and said plurality of sound waves by said processor to a mobile device via a wireless communication network so as to allow an application installed on said mobile device to process said outcome.